When Can You Get Away with Low Memory Adam?
Kalra, Dayal Singh, Kirchenbauer, John, Barkeshli, Maissam, Goldstein, Tom
Adam is the go-to optimizer for training modern machine learning models, but it requires additional memory to maintain the moving averages of the gradients and their squares. While various low-memory optimizers have been proposed that sometimes match the performance of Adam, their lack of reliability has left Adam as the default choice. In this work, we apply a simple layer-wise Signal-to-Noise Ratio (SNR) analysis to quantify when second-moment tensors can be effectively replaced by their means across different dimensions. Our SNR analysis reveals how architecture, training hyperparameters, and dataset properties impact compressibility along Adam's trajectory, naturally leading to $\textit{SlimAdam}$, a memory-efficient Adam variant. $\textit{SlimAdam}$ compresses the second moments along dimensions with high SNR when feasible, and leaves them uncompressed when compression would be detrimental. Through experiments across a diverse set of architectures and training scenarios, we show that $\textit{SlimAdam}$ matches Adam's performance and stability while saving up to $98\%$ of total second moments. Code for $\textit{SlimAdam}$ is available at https://github.com/dayal-kalra/low-memory-adam.
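The SNR test described in the abstract can be sketched as follows. This is a minimal illustration, not SlimAdam's actual implementation: the threshold value, the variance stabilizer, and the function names are assumptions for the example.

```python
import numpy as np

def snr_along(v, axis):
    """SNR of second-moment entries along `axis`: mean^2 / variance of v
    across that axis, averaged over the remaining dimensions. A high SNR
    means entries are nearly constant along `axis`, so replacing them
    with their mean loses little information."""
    mean = v.mean(axis=axis)
    var = v.var(axis=axis)
    return float((mean**2 / (var + 1e-12)).mean())  # 1e-12 guards against zero variance

def maybe_compress(v, axis, threshold=1.0):
    """Replace v by its mean along `axis` when the SNR clears the threshold,
    otherwise keep the full second-moment tensor."""
    if snr_along(v, axis) >= threshold:
        return v.mean(axis=axis, keepdims=True)  # broadcastable compressed moment
    return v

# Toy second-moment tensor that is nearly constant across rows (axis 0)
rng = np.random.default_rng(0)
v = np.tile(rng.uniform(0.1, 1.0, size=(1, 8)), (64, 1))
v *= rng.uniform(0.99, 1.01, size=(64, 8))  # small row-to-row noise
compressed = maybe_compress(v, axis=0)
print(compressed.shape)  # (1, 8): a 64x saving along the compressed dimension
```

The compressed moment keeps a broadcastable shape, so the Adam update rule can consume it unchanged.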
Reviews: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes
This review has 2 parts. The first part is my review of the paper as a standalone paper. The second part is a meta-commentary unifying my reviews for both this paper and "Neural Tangent Kernel for Any Architecture". Part 1 This paper demonstrates that infinitely-wide architectures made from a range of building blocks are Gaussian processes. Fundamentally, the paper seems to have two core contributions. This paper is a clean, elegant and logical next step in an important research direction.
asanAI: In-Browser, No-Code, Offline-First Machine Learning Toolkit
Koch, Norman, Ghiasvand, Siavash
Machine learning (ML) has become crucial in modern life, with growing interest from researchers and the public. Despite its potential, a significant entry barrier prevents widespread adoption, making it challenging for non-experts to understand and implement ML techniques. The increasing desire to leverage ML is counterbalanced by its technical complexity, creating a gap between potential and practical application. This work introduces asanAI, an offline-first, open-source, no-code machine learning toolkit designed for users of all skill levels. It allows individuals to design, debug, train, and test ML models directly in a web browser, eliminating the need for software installations and coding. The toolkit runs on any device with a modern web browser, including smartphones, and ensures user privacy through local computations while utilizing WebGL for enhanced GPU performance. Users can quickly experiment with neural networks and train custom models using various data sources, supported by intuitive visualizations of network structures and data flows. asanAI simplifies the teaching of ML concepts in educational settings and is released under an open-source MIT license, encouraging modifications. It also supports exporting models in industry-ready formats, empowering a diverse range of users to effectively learn and apply machine learning in their projects. The proposed toolkit is successfully utilized by researchers of ScaDS.AI to swiftly draft and test machine learning ideas, by trainers to effectively educate enthusiasts, and by teachers to introduce contemporary ML topics in classrooms with minimal effort and high clarity.
PreNeT: Leveraging Computational Features to Predict Deep Neural Network Training Time
Pourali, Alireza, Boukani, Arian, Khazaei, Hamzeh
Training deep learning models, particularly Transformer-based architectures such as Large Language Models (LLMs), demands substantial computational resources and extended training periods. While optimal configuration and infrastructure selection can significantly reduce associated costs, this optimization requires preliminary analysis tools. This paper introduces PreNeT, a novel predictive framework designed to address this optimization challenge. PreNeT facilitates training optimization by integrating comprehensive computational metrics, including layer-specific parameters, arithmetic operations and memory utilization. A key feature of PreNeT is its capacity to accurately predict training duration on previously unexamined hardware infrastructures, including novel accelerator architectures. This framework employs a sophisticated approach to capture and analyze the distinct characteristics of various neural network layers, thereby enhancing existing prediction methodologies. Through proactive implementation of PreNeT, researchers and practitioners can determine optimal configurations, parameter settings, and hardware specifications to maximize cost-efficiency and minimize training duration. Experimental results demonstrate that PreNeT achieves up to 72% improvement in prediction accuracy compared to contemporary state-of-the-art frameworks.
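The idea of predicting training duration from accumulated layer-level computational metrics can be sketched as below. The FLOP formula, the linear calibration model, and the measurement numbers are illustrative assumptions, not PreNeT's fitted predictor.

```python
import numpy as np

def dense_layer_flops(batch, d_in, d_out):
    """Approximate forward+backward multiply-accumulates for a dense layer:
    forward costs batch*d_in*d_out MACs; backward roughly doubles that."""
    return 3 * batch * d_in * d_out

def predict_step_time(layers, batch, coef, intercept):
    """Predict one training-step time by accumulating per-layer FLOPs
    and applying a fitted linear model (coef in seconds per FLOP)."""
    total = sum(dense_layer_flops(batch, d_in, d_out) for d_in, d_out in layers)
    return coef * total + intercept

# Hypothetical calibration on one device: measured step times vs. total FLOPs
flops = np.array([1e9, 2e9, 4e9, 8e9])
times = np.array([0.011, 0.020, 0.041, 0.080])  # illustrative measurements, seconds
coef, intercept = np.polyfit(flops, times, 1)

mlp = [(784, 512), (512, 512), (512, 10)]
predicted = predict_step_time(mlp, batch=128, coef=coef, intercept=intercept)
print(f"{predicted:.4f} s per step")
```

Calibrating (coef, intercept) per accelerator is what would let such a model extrapolate to hardware it was not trained on.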
Designing deep neural networks for driver intention recognition
Vellenga, Koen, Steinhauer, H. Joe, Karlsson, Alexander, Falkman, Göran, Rhodin, Asli, Koppisetty, Ashok
Driver intention recognition studies increasingly rely on deep neural networks. Deep neural networks have achieved top performance on many different tasks, but it is not common practice to explicitly analyse the complexity and performance of the network's architecture. Therefore, this paper applies neural architecture search to investigate the effects of deep neural network architecture on a real-world safety-critical application with limited computational capabilities. We explore a pre-defined search space over three deep neural network layer types capable of handling sequential data (long short-term memory, temporal convolution, and time-series transformer layers), and the influence of different data fusion strategies on driver intention recognition performance. A set of eight search strategies is evaluated on two driver intention recognition datasets. For both datasets, we observed no search strategy that clearly samples better deep neural network architectures. However, performing an architecture search does improve model performance compared to the original manually designed networks. Furthermore, we observe no relation between increased model complexity and higher driver intention recognition performance. The results indicate that multiple architectures yield similar performance, regardless of the deep neural network layer type or fusion strategy.
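The search setup described above, sampling a layer type and a fusion strategy from a pre-defined space, can be sketched with a plain random-search loop. The space values and the evaluation stand-in are illustrative assumptions; in the paper, evaluation means training and validating the candidate on a driver-intention dataset.

```python
import random

# Hypothetical search space mirroring the three sequential layer types
SEARCH_SPACE = {
    "layer_type": ["lstm", "temporal_conv", "ts_transformer"],
    "hidden_size": [32, 64, 128],
    "fusion": ["early", "late"],
}

def sample_architecture(rng):
    """Draw one candidate configuration uniformly from the search space."""
    return {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}

def evaluate(arch):
    """Stand-in for the real objective (train + validate on the dataset);
    seeded by the config so the sketch stays deterministic."""
    return random.Random(str(sorted(arch.items()))).random()

def random_search(n_trials=20, seed=0):
    """One of the simplest of the eight possible search strategies:
    sample n_trials candidates and keep the best-scoring one."""
    rng = random.Random(seed)
    trials = [sample_architecture(rng) for _ in range(n_trials)]
    return max(trials, key=evaluate)

best = random_search()
print(best)
```

Random search is a standard baseline here; the paper's observation that no strategy clearly dominates is consistent with such simple baselines being competitive.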
Accuracy is not the only Metric that matters: Estimating the Energy Consumption of Deep Learning Models
Getzner, Johannes, Charpentier, Bertrand, Günnemann, Stephan
Published as a workshop paper at "Tackling Climate Change with Machine Learning", ICLR 2023. Modern machine learning models have started to consume incredible amounts of energy, thus incurring large carbon footprints (Strubell et al., 2019). Deep CNNs, such as VGG16 or ResNet50, already deliver great performance (Simonyan & Zisserman, 2014; He et al., 2015), yet the increasing number of layers in such models comes at the cost of severely increased computational complexity, resulting in the need for power-hungry hardware (Thompson et al., 2020; Jin et al., 2016). An example of a model that behaves extremely poorly in this regard is a big transformer with neural architecture search (Strubell et al., 2019). Clearly, training and running these models is not just a matter of financial cost, but also environmental impact. We address this by collecting high-quality energy data and building a first baseline model, capable of predicting the energy consumption of DL models by accumulating their estimated layer-wise energies.
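The layer-wise accumulation idea can be sketched as follows. The linear MACs-to-energy model and all numbers are illustrative assumptions, not the authors' fitted per-layer predictors.

```python
def estimate_model_energy(layers, per_layer_model):
    """Accumulate estimated per-layer energies into a model-level estimate,
    mirroring the layer-wise approach described above."""
    return sum(per_layer_model(layer) for layer in layers)

def linear_energy_model(layer, joules_per_mac=1e-10):
    """Toy stand-in: energy proportional to the layer's multiply-accumulate
    count. A real predictor would be fitted on measured energy data."""
    return layer["macs"] * joules_per_mac

# Hypothetical VGG-like layer list with rough MAC counts
vgg_like = [
    {"name": "conv1", "macs": 1.8e9},
    {"name": "conv2", "macs": 3.7e9},
    {"name": "fc", "macs": 4.1e8},
]
total_j = estimate_model_energy(vgg_like, linear_energy_model)
print(f"{total_j:.3f} J per forward pass")  # 0.591 J with these toy numbers
```

Swapping `linear_energy_model` for separately fitted models per layer type (conv, dense, pooling) is what turns this skeleton into a usable estimator.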
An algorithmic framework for the optimization of deep neural networks architectures and hyperparameters
Keisler, Julie, Talbi, El-Ghazali, Claudel, Sandra, Cabriel, Gilles
While each new learning task typically requires the handcrafted design of a new DNN, automated deep learning facilitates their creation. The goals are to give less experienced people access to deep learning, to reduce the tedious work of tuning the many parameters needed to reach an optimal DNN, and, finally, to go beyond what humans can design by producing non-intuitive DNNs that may ultimately prove more efficient. Optimizing a DNN means automatically finding an optimal architecture for a given learning task: choosing the operations, the connections between those operations, and the associated hyperparameters. The first task is known as Neural Architecture Search (NAS) [Elsken et al., 2019], and the second as HyperParameter Optimization (HPO). Most works in the literature tackle only one of these two optimization problems. Many papers on NAS [White et al., 2021, Loni et al., 2020b, Wang et al., 2019b, Sun et al., 2018b, Zhong, 2020] focus on designing optimal architectures for computer vision tasks with many stacked convolution and pooling layers. Because each DNN training run is time-consuming, researchers have tried to reduce the search space by adding many constraints that prevent the search from finding irrelevant architectures. This limits the flexibility of the designed search spaces and restricts hyperparameter optimization.
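Treating NAS and HPO as one joint problem, as the abstract advocates, can be sketched by sampling a candidate that bundles both the architecture and its hyperparameters. The operation names, ranges, and depth bounds are illustrative assumptions, not the paper's actual search space.

```python
import random

def sample_candidate(rng):
    """Jointly sample an architecture (operations and depth) and its
    training hyperparameters, so NAS and HPO are searched together
    rather than as two separate problems."""
    depth = rng.randint(2, 5)
    ops = [rng.choice(["conv3x3", "pool", "identity"]) for _ in range(depth)]
    hparams = {
        "learning_rate": 10 ** rng.uniform(-4, -2),  # log-uniform sampling
        "dropout": rng.choice([0.0, 0.1, 0.3]),
    }
    return {"ops": ops, "hparams": hparams}

rng = random.Random(42)
candidate = sample_candidate(rng)
print(candidate["ops"], candidate["hparams"])
```

A search algorithm (evolutionary, Bayesian, or plain random) would then score such candidates by training them, which is exactly where the constrained search spaces criticized above come from.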
💣Notes to Self: Convolutional Neural Networks (CNNs or ConvNets)
First, the name "convolutional" comes from the mathematical operation convolution. Cross-correlation and convolution are easily confused in machine learning: cross-correlation does not flip the kernel (or source image), whereas convolution does. Convolutional neural networks are the most widely used type of neural network for computer vision applications. CNNs are a family of deep neural networks that rely mainly on convolutions to achieve the expected task. One of the most famous articles about CNNs (LeNet), by Yann LeCun, is "Gradient-Based Learning Applied to Document Recognition."
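The flip distinction above is easy to see in 1D with NumPy, whose `convolve` flips the kernel while `correlate` does not. The example kernel is chosen to be antisymmetric so the two results visibly differ:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([1.0, 0.0, -1.0])  # antisymmetric kernel makes the flip visible

# True convolution flips the kernel before sliding it over the input.
conv = np.convolve(x, k, mode="valid")

# Cross-correlation (what most deep learning "conv" layers compute) does not flip.
xcorr = np.correlate(x, k, mode="valid")

print(conv)   # [2. 2.]
print(xcorr)  # [-2. -2.]  (opposite sign, since flipping k negates it)
```

Because learned kernels are free parameters, a network trained with cross-correlation simply learns the flipped weights, which is why the distinction rarely matters in practice but still matters for the terminology.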